Today we are happy to announce the release of version 0.9.3. This release focuses mainly on the python bindings and comes with the new headless scripting mode! Several new classes and methods have also been added in the meantime, making the python bindings as powerful as the user interface. In addition, several other improvements have been made to the software:
- Added parsing and unpacking support for firmware images: JFFS2, SquashFS and UImage
- Better Rust support: CFG reconstruction update and a Rust-specific string extraction algorithm
- Improved the Yara module: OpenSSL-based attributes (e.g. pe.number_of_signatures) are now available
- Various user interface improvements
- A few bugs have been fixed (thank you again for forwarding them!)
- ... and the usual doc / anomalies / yara signatures updates
Also, since the headless mode has been added for paid users, we thought it would be nice to move one of the paid features into the lite version. So, lite users can now enjoy fast multithreaded analysis too! Note that you may need to increase the Number of threads option in Edit > Preferences > Analysis setup.
Python, python, python
New headless mode
With version 0.9.3 we have bundled Malcat's binary analysis into a python module. The module is available to full & pro users. Using the module you can use Malcat's code and data analyses without having to run the GUI (aka headless scripting). That's the way to go if you want to process a large number of files. See the user manual for how to install and use it.
We think it's a pretty cool addition to Malcat and it adds many possibilities:
- Export/import analysis details to/from other code analysis tools
- Enable batch file analysis to solve complex problems: malware classification, detection, etc.
- Write command line tools
While the python module is still in beta, we're very happy about this change and chances are it will bring many great functionalities in the future.
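As a rough illustration of the batch use case, the sketch below feeds a list of files to an analysis callable and collects failures instead of aborting on the first one. The helper batch_analyse and its analyse parameter are hypothetical names made up for this example; the callable is injected so that the snippet runs without Malcat installed.

```python
from typing import Any, Callable, Dict, Iterable, Tuple


def batch_analyse(
    paths: Iterable[str],
    analyse: Callable[[str], Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run `analyse` on every path and return (results, errors), keyed by path."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = analyse(path)
        except Exception as exc:
            # Keep going: one broken sample should not stop the whole batch.
            errors[path] = exc
    return results, errors
```

In headless mode one would presumably pass malcat.analyse as the callable, assuming it accepts a file path; its exact arguments and return value should be checked in the user manual.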
Renaming
Before the headless mode, python scripting in Malcat was limited to the script editor window. Thus it was not a big deal that Malcat's bindings module was (stupidly) named "bindings", since it would only be used from within Malcat.
But with the addition of the headless python module, this had to change. In particular, two non-backward compatible renamings took place:
- Malcat's python module is now named "malcat", which makes a lot more sense
- As a consequence, the current analysis object in the script editor has been renamed from "malcat" to "analysis"
All of Malcat's parsers, templates, scripts and anomalies have been rewritten accordingly. If you wrote your own scripts, you will have to update them too (well, doing two string replacements should be manageable).
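The two replacements have to be applied in the right order, otherwise the freshly renamed module would be renamed a second time. A minimal sketch, assuming your scripts reference both names as plain identifiers (migrate_script and some_attribute are hypothetical names used for illustration only):

```python
import re


def migrate_script(source: str) -> str:
    """Apply the two 0.9.3 renamings to the text of a pre-0.9.3 script."""
    # 1. The analysis object: malcat -> analysis. This must run first,
    #    otherwise the module name introduced in step 2 would be renamed too.
    source = re.sub(r"\bmalcat\b", "analysis", source)
    # 2. The bindings module: bindings -> malcat.
    source = re.sub(r"\bbindings\b", "malcat", source)
    return source


old_script = "import bindings\nprint(malcat.some_attribute)\n"
print(migrate_script(old_script))
```

Note that this naive text replacement also rewrites the two words inside comments and string literals, so review the result before running it.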
Additional bindings
A few additional python bindings have been added in version 0.9.3. They focus on user annotations and editing:
- User comments now have bindings: you can add, edit and navigate through user comments
- User highlighted regions (like comments but on data ranges and with colors) now have bindings: you can add, edit and navigate through them
- You can now force and unforce function definitions in python
- You can now force and unforce custom type definitions in python
- You can now view and dump virtual files from python
- You can now override the detected file type from python
- You can now override the detected CPU architecture from python
- Helper functions have been added to pretty print or convert addresses
- [Headless mode] You can analyse arbitrary files or byte buffers via the malcat.analyse() function
- You can invalidate and rerun analyses (useful if you have edited the files or added user annotations)
- You can apply templates to an analysis object to generate a report
- You have access to the analysis undo/redo manager from python
- You can load and save projects to disk from python
While the python bindings are not 100% complete, they should now be almost as powerful as the user interface. If you want to see them in action, have a look at the new command-line tool in <malcat dir>/bin/malcat.report.py: a simple tool that generates a textual report for a file using the template() method.
File formats
Firmware support (JFFS2, SquashFS and UImage)
If there is one domain where one encounters unknown binary files in large quantities, it is the subtle art of firmware reversing. In order to help our fellow hardware reversers, we have added support for the three most common image formats:
- UImage is more a container than a file system, but it is often used in firmware nonetheless. Malcat can identify the container and extract its content, provided a lzo/lzma/gzip/bzip2 compression is used.
- JFFS2 is a log-structured file system designed for use on flash devices in embedded systems. Malcat supports all LSB file systems (if you have an MSB one, please send it to us), and can unpack files in-app, provided lzo/lzma/rtime/zlib compression is used.
- SquashFS is a compressed read-only file system for Linux. Malcat can parse it and its different streams, and can unpack files in-app, provided lzo/lzma/xz compression is used.
And that is only a start! We feel that Malcat could be useful in this area, so you are likely to see additional improvements in the future. For instance, ARM/MIPS disassembly support, in order to have a look at all these extracted programs.
Note that like with all other parsers in Malcat, when a format is supported, it means that you will be able to see its internal structures and also identify it when embedded inside larger files (aka file carving).
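To give an idea of what carving means in practice, here is a minimal sketch that scans a buffer for the well-known magic values of UImage and (little-endian) SquashFS. This is not Malcat's implementation: a real parser validates the header that follows each hit. JFFS2 is left out on purpose, since its 16-bit node magic would produce far too many false hits without extra validation. The names SIGNATURES and carve are made up for this example.

```python
from typing import Dict, List

# Well-known magic values (not taken from Malcat's parsers).
SIGNATURES: Dict[str, bytes] = {
    "UImage": b"\x27\x05\x19\x56",  # big-endian header magic 0x27051956
    "SquashFS": b"hsqs",            # little-endian superblock magic
}


def carve(buffer: bytes) -> Dict[str, List[int]]:
    """Return, for each format, every offset at which its magic occurs."""
    hits: Dict[str, List[int]] = {}
    for name, magic in SIGNATURES.items():
        offsets: List[int] = []
        pos = buffer.find(magic)
        while pos != -1:
            offsets.append(pos)
            pos = buffer.find(magic, pos + 1)
        hits[name] = offsets
    return hits
```

Each returned offset is only a candidate: the magic bytes may also appear by chance inside unrelated data.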
Improved ZIP unpacking
The ZIP parser has seen some minor improvements too:
- The parser should now run about 30% faster, which is always nice to have.
- We have fixed a small bug which prevented unpacking files compressed in stream mode (where the compressed size is given in a later DataDescriptor entry).
- We have added (basic) support for AES-encrypted ZIP (WzAES compression method).
Note that since the python zipfile module lacks support for AES-encrypted ZIP archives, unpacking is done manually in python and we only support AES+deflate or AES+bzip2 combos. Anyway, it should be just enough to open password-protected ZIP files coming from Malware Bazaar.
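Both cases can be recognised from the 30-byte ZIP local file header. In stream mode, bit 3 of the general-purpose flags is set and the real sizes follow the data in a data descriptor; WinZip AES entries store 99 in the compression method field, the actual method being kept in an extra field. The sketch below only inspects the header and is not Malcat's unpacking code; EntryInfo and inspect_local_header are hypothetical names.

```python
import struct
from typing import NamedTuple

# signature, version, flags, method, mtime, mdate, crc32,
# compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")  # 30 bytes
FLAG_DATA_DESCRIPTOR = 0x0008                  # general-purpose flag, bit 3
METHOD_WINZIP_AES = 99                         # WinZip AES marker


class EntryInfo(NamedTuple):
    method: int
    compressed_size: int
    uses_data_descriptor: bool
    is_winzip_aes: bool


def inspect_local_header(data: bytes, offset: int = 0) -> EntryInfo:
    """Decode one local file header and report stream mode / AES usage."""
    (sig, _version, flags, method, _mtime, _mdate,
     _crc, csize, _usize, _name_len, _extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    return EntryInfo(
        method=method,
        compressed_size=csize,
        uses_data_descriptor=bool(flags & FLAG_DATA_DESCRIPTOR),
        is_winzip_aes=(method == METHOD_WINZIP_AES),
    )
```

When uses_data_descriptor is true, the compressed size read from the local header cannot be trusted and has to be taken from the data descriptor or the central directory instead.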
Did you know: Malcat can automatically open in-app files from a password-protected ZIP archive if the password is either "infected", "virus" or "malware". No need to unpack them on disk! This is also true for 7Z archives, provided you have the py7zr python library installed.
Better PE debug info parsing
We have made a small improvement to the PE parser: all debug information is now parsed.
Also, a video from @struppigel made me aware of the REPRO debug info type. This type of debug info is now parsed correctly and, additionally, all PE timestamp-based anomalies are now set to silent when REPRO debug entries are found.
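The reason for silencing these anomalies is that in reproducible builds the PE timestamp fields are filled from a hash rather than a real date. A minimal sketch of the check, assuming you already have the raw bytes of the debug directory (an array of 28-byte IMAGE_DEBUG_DIRECTORY entries); the function names are made up for this example and this is not Malcat's parser:

```python
import struct
from typing import List

# Characteristics, TimeDateStamp, MajorVersion, MinorVersion,
# Type, SizeOfData, AddressOfRawData, PointerToRawData
DEBUG_ENTRY = struct.Struct("<IIHHIIII")  # IMAGE_DEBUG_DIRECTORY, 28 bytes
IMAGE_DEBUG_TYPE_REPRO = 16


def debug_types(directory: bytes) -> List[int]:
    """Return the Type field of every entry in a raw debug directory."""
    count = len(directory) // DEBUG_ENTRY.size
    return [
        DEBUG_ENTRY.unpack_from(directory, i * DEBUG_ENTRY.size)[4]
        for i in range(count)
    ]


def timestamps_are_meaningful(directory: bytes) -> bool:
    """False when a REPRO entry is present (timestamps then hold hash bits)."""
    return IMAGE_DEBUG_TYPE_REPRO not in debug_types(directory)
```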
Rust
Better CFG reconstruction
This is not big news: Rust usage is increasing, and it means that we're starting to see it used in malicious software too. While Rust-compiled binaries do not differ a lot from, say, Golang binaries, there were a few minor differences which annoyed our CFG reconstruction algorithm.
For instance, noreturn calls are followed by a UD2 instruction, which is an artifact of the unreachable instruction in LLVM. Malcat can now recognize this call pattern as a noreturn call. A couple of other LLVM-specific artifacts are now better recognized by the CFG reconstruction algorithm as well.
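On x86, this pattern is easy to picture at the byte level: a relative call (opcode E8 followed by a 4-byte displacement) directly followed by UD2 (0F 0B). The sketch below is a deliberately naive byte scan written for illustration; Malcat works on disassembled instructions, and how it implements the check is not described here. The name find_noreturn_calls is hypothetical.

```python
from typing import List

CALL_REL32 = 0xE8   # x86 "call rel32" opcode, followed by a 4-byte displacement
UD2 = b"\x0f\x0b"   # x86 "ud2" encoding


def find_noreturn_calls(code: bytes) -> List[int]:
    """Return offsets of `call rel32` instructions immediately followed by `ud2`."""
    hits: List[int] = []
    # A match needs 7 bytes: 5 for the call, 2 for ud2.
    for off in range(len(code) - 6):
        if code[off] == CALL_REL32 and code[off + 5:off + 7] == UD2:
            hits.append(off)
    return hits
```

Because it ignores instruction boundaries, such a scan can match bytes that sit in the middle of another instruction, which is why a real analysis runs this check on the disassembly.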
String extraction heuristics
Similar to Golang, Rust-compiled programs do not use standard null-terminated strings, but instead rely on (string pointer, string size) pairs. These pairs can be located in arrays or, more annoyingly, hardcoded directly in the compiled code. To make things worse, strings are often stored side by side with no delimiter between them, which makes the standard linear-sweep search algorithm useless when trying to reconstruct Rust strings.
In order to have better string identification in Malcat for Rust programs, we've added a set of new string extraction heuristics targeted at the most common string loading patterns found in Rust. These heuristics are still in their early days, but you should benefit from much better string extraction from now on.
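The array case can be sketched in a few lines. The example below assumes a 64-bit little-endian target, where each pair is two 8-byte integers, and a string blob mapped at a known base address; read_string_pairs and the sample data are made up for this illustration and do not reflect Malcat's actual heuristics.

```python
import struct
from typing import List


def read_string_pairs(table: bytes, blob: bytes, blob_base: int) -> List[str]:
    """Decode an array of little-endian 64-bit (pointer, size) pairs against `blob`."""
    strings: List[str] = []
    for ptr, size in struct.iter_unpack("<QQ", table):
        start = ptr - blob_base
        strings.append(blob[start:start + size].decode("utf-8"))
    return strings


# Two strings stored back to back, with no terminator between them.
blob = b"invalid argumentout of memory"
table = struct.pack("<QQQQ", 0x1000, 16, 0x1010, 13)
print(read_string_pairs(table, blob, 0x1000))
```

A linear sweep over the same blob would report a single 29-character string; only the sizes stored in the pairs tell where one string ends and the next begins.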
Yara
Crypto functionalities
We are now linking Yara against the OpenSSL library. While it adds more than 2 MB to the library size, it enables crypto-related features in Yara rules, such as pe.number_of_signatures. If you had to deactivate a few of your rules because of such fields, give them another try!
Improved Yara dialog
The "new Yara rule" dialog has been improved a bit: you can now choose in which Yara file the new rule will be created. Before, it was only possible to add a rule to the currently opened Yara file.
We have also added a new context menu action for strings, byte ranges and disassembly: "add to new Yara rule". This lets you add the currently selected string/bytes/code to a new Yara rule, saving you a few clicks.
Full changelog
Here is the complete changelog of this release:
● The lite version can now use multithreaded analysis!
● Python:
- The current analysis object in scripts has been renamed "analysis" (was "malcat")
- Renamed bindings module to "malcat" (was "bindings")
- A new python headless mode was added to full & pro versions! You can now import the malcat module from any python interpreter & perform batch analyses!
- Added "malcat.analyse()" method to the malcat module in headless mode
- You can now view and edit user comments from python (analysis.comments)
- You can now force/unforce function starts from python (analysis.fn.(un)force)
- You can now set custom data types from python (analysis.struct.(un)force)
- You can now view and edit user highlighted regions (analysis.highlights)
- You can now view and dump virtual files from python (analysis.vfiles)
- You can now override the detected file type from python (analysis.type = ...)
- You can now override the CPU architecture from python (analysis.architecture = ...)
- Added methods to drive the analysis (analysis.invalidate, analysis.run)
- Made bindings for the analysis error log (analysis.log, analysis.status, analysis.last_error, analysis.failed)
- You can now load and save Malcat projects (with all user modifications) from python (analysis.load/save)
- Added helper functions for address translation and output (analysis.ppa, analysis.v2a, etc.)
- You can now apply templates (.tpl files) to an analysis from python (analysis.template)
● File parsers:
- Added support for UImage archive format (with in-app unpacking), often found in firmwares. Exotic compression algorithms are not supported.
- Added support for JFFS2 file systems (with in-app file extraction), often found in firmwares. Exotic compression algorithms are not supported.
- Added support for SquashFS file systems (with in-app file extraction), often found in firmwares. Exotic compression algorithms are not supported.
- [PE] Improved debug info parser: now parses all debug info structures. Correctly interprets repro entries (thx @struppigel's video)
- [PE] Added parser for bound imports
- [FAT12/16/32] Ignore deleted entries in directories
- [VHD] Proper handling of hollow dynamic drives
- [ZIP] 30% performance optimisation
- [ZIP] (Very) basic support for AES-password-protected archives. Should be just enough to open malware bazaar's files directly from within malcat.
- [7Z] Automatic unpacking of password-protected archives if the password is "infected", "malware" or "virus" (note that you need the py7zr library installed)
● Yara:
- Added OpenSSL library to the Yara scanner: crypto-related fields such as pe.number_of_signatures should now work
- You can now override the destination file when creating a new Yara rule
- Added "Add to new Yara rule" context menu action to selection, strings and disassembly
- Give focus and proper cursor position in Yara editor after "Add to (new) Yara rule" context menu action
● Rust:
- Added support for Rust's final function call pattern (should help with CFG reconstruction)
- Added Rust string analysis
● Transforms:
- Added JS beautify transform (requires jsbeautifier lib)
- Moved all "obfuscation" transforms to the text category
● User annotation:
- You can now add custom annotations (custom text) using the selection context menu. Useful for screenshots and note taking.
- Undo/redo support
- Saved with project
- Preview control
- Hit a/A to jump to next/previous user annotation
● User interface:
- In the structure view, also show the extended context menu (including xrefs) for selected fields
- Reduced the size of the transform dialog to fit in smaller resolution screens
- Optimized redraw speed of structures tree
- Display a "bell" icon in case of warnings during analysis in the status bar
- Clicking on the icon in the statusbar brings you to the output log window (script editor view)
- Source code viewer now has wordwrap enabled
- Using "Select All" (Ctrl+A) command in the script view now selects all text in either the script editor or script output window (depending on which one has focus)
- Using "Select All" (Ctrl+A) command in the decompiler view now selects the C code of the current function
- Using "Select All" (Ctrl+A) command in the corpus view now selects all matching files
- You can now select & copy multiple items in the corpus view list
- Files in the Virtual File System tab are now sorted by name
- The summary view has a new "Type" column that displays the current identified file type with an icon
- Added "open" and "dump" actions to the string context menu. They convert strings to utf-8 beforehand
- Library functions (e.g. FLIRT-identified fns) in symbol view are highlighted using the "DEBUG" highlighting color
- Hexadecimal number display shortcut now changed to Ctrl+Shift+D: Ctrl+D should now properly duplicate the current line in all scintilla-based editor windows under Linux
- Changing the number of threads for the analysis in the options does not require an app restart anymore
- Optimized augmented scroll bar redraw performance when displaying large complex files
- Use c/C in data view to jump to the next identified constant, use r/R to jump to the next string, use y/Y to the next Yara string match
● Bug fixing:
- [.NET] Fixed an issue in .NET class parser where the last field of the FieldTable would not be parsed
- Default syntax highlighting for text files would only consider lower-case file extensions
- Better validation for python conversion of DosDate and DosDateTime fields
- In some cases, long binary stack strings with no single ascii character were not detected
- Added some extra vertical space to big file dialog (thx @Squiblydoo)
- [IMPHASH] Malcat should now compute imphash exactly like pefile, using the same outdated ordinals list, for 100% backward-compatibility (thx @Marco)
- [ZIP] Fixed zip extractor not being able to unpack files packed in stream mode
- [PE] Fixed edge case where section gaps were incorrectly computed
- [LINUX] Fixed int overflow error in the entropy analysis for files > 4 GB
- Renaming a function in disassembly or decompiler view would not display the new name immediately in some cases